Submitting CMS Jobs
There are several methods of submitting CMS jobs to the computing resources at Purdue: CRAB, CMS-Connect, and Slurm.
The optimal choice of a submission method depends on the nature and requirements of your task. The use cases and submission instructions for each method are given below.
CRAB jobs
CRAB (CMS Remote Analysis Builder) is a utility to submit CMSSW jobs to distributed computing resources. CRAB allows users to:
- Access Data and Monte Carlo datasets stored at any CMS computing site worldwide.
- Exploit the CPU and storage resources at CMS computing sites via the Worldwide LHC Computing Grid (WLCG).
CRAB is intended for regular CMSSW jobs (those run with the cmsRun command). It is recommended to use CRAB for computationally intensive jobs, such as Monte Carlo generation or "skimming" AOD / MiniAOD datasets.
Instructions for submitting CRAB jobs
CMS-Connect jobs
CMS-Connect is a service designed to provide a Tier3-like environment for HTCondor analysis jobs. CMS-Connect is a complementary service to CRAB. It allows users to:
- Submit jobs to all resources available in the CMS Global Pool (CMS computing resources of WLCG).
- Submit non-cmsRun jobs that are not suitable for running in a CRAB environment.
CMS-Connect job submission can be used when a CRAB-like distributed submission is desirable, but the jobs are not regular CMSSW jobs (cmsRun).
Instructions for submitting CMS-Connect jobs
Slurm jobs
Slurm is a job scheduler and workload manager that enables batch submission locally on Purdue computing clusters. Slurm allows users to:
- Utilize the local resources of Purdue computing clusters with lower latency compared to WLCG resources.
- Submit jobs to dedicated queues to minimize wait times.
- Design custom distributed workflows using third-party software such as Dask.
Slurm job submission is the preferred method for less computationally intensive tasks, such as NanoAOD processing or producing histograms and plots.
Instructions for submitting Slurm jobs
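As a rough illustration of a local batch job, the sketch below builds the text of a minimal Slurm batch script in Python. The function name make_batch_script, the job name, and the payload command are hypothetical. The sketch also assumes that the queue is selected with sbatch's --account option; this should be checked against the cluster's own instructions. The --job-name, --account, --time, --nodes and --ntasks options themselves are standard sbatch options.

```python
def make_batch_script(queue, walltime, command, job_name="cms-job", ntasks=1):
    """Return the text of a minimal Slurm batch script.

    queue:    queue name, e.g. "cms" (passing it via --account is an assumption)
    walltime: time limit in Slurm's HH:MM:SS format
    command:  shell command to run inside the job (hypothetical payload)
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --account={queue}",
        f"#SBATCH --time={walltime}",
        "#SBATCH --nodes=1",
        f"#SBATCH --ntasks={ntasks}",
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


# Example: a 4-hour histogramming job (script name is a placeholder).
script = make_batch_script("cms", "04:00:00", "python make_histograms.py")
print(script)
```

Saving the printed text to a file and passing that file to sbatch on a cluster front-end would submit the job.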
Slurm queues at Purdue clusters
To list the queues available to you on a particular cluster, run the slist command in the terminal.
Debug and standby queues:
- All clusters have debug and standby queues with 30 min and 4 h time limits, respectively.
- The debug queue is meant for quick tests, while standby can be used for longer general-purpose jobs.
- These queues are shared by all research groups; therefore, in case of a heavy job processing load, some wait time can be expected.
- Exceptions: Hammer's standby queue has a time limit of 12 h; Negishi does not have a debug queue.
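The rules above can be written down as a small lookup. The following Python sketch is only an illustration: the helper names shared_queue_limits and pick_shared_queue are hypothetical, and the time limits are copied from the list above rather than queried from the scheduler.

```python
# Default time limits in minutes for the shared queues, as listed above.
DEFAULT_LIMITS = {"debug": 30, "standby": 4 * 60}

# Per-cluster exceptions; None means the queue does not exist on that cluster.
EXCEPTIONS = {
    "Hammer": {"standby": 12 * 60},
    "Negishi": {"debug": None},
}


def shared_queue_limits(cluster):
    """Return {queue: limit_in_minutes} for the shared queues on a cluster."""
    limits = dict(DEFAULT_LIMITS)
    limits.update(EXCEPTIONS.get(cluster, {}))
    return {queue: t for queue, t in limits.items() if t is not None}


def pick_shared_queue(cluster, minutes):
    """Pick the shared queue with the shortest limit that still fits the job."""
    fitting = [(t, queue) for queue, t in shared_queue_limits(cluster).items()
               if t >= minutes]
    if not fitting:
        return None  # too long for debug/standby; a dedicated queue is needed
    return min(fitting)[1]


print(pick_shared_queue("Bell", 20))     # debug
print(pick_shared_queue("Negishi", 20))  # standby (Negishi has no debug queue)
print(pick_shared_queue("Hammer", 600))  # standby (12 h limit on Hammer)
print(pick_shared_queue("Bell", 600))    # None
```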
Dedicated queues:
- Additionally, there are dedicated queues on different clusters, as listed below.
- Please note: queues not listed here should not be relied upon, as they can be removed at any time!
Cluster | Queue name | Number of cores | Time limit | Notes |
Hammer | cms | 7400 | 14 days | standard priority |
Hammer | cms-a | 3000 | 14 days | higher priority |
Gilbreth | cms-f | 4 | 14 days | 2x Tesla V100 |
Bell | cms | 1216 | 14 days | standard priority |
Bell | gpu | 512 | | 2x AMD Instinct MI50 |
Bell | highmem | 1024 | 1 day | 4 GB RAM per core |
Bell | multigpu | 48 | | 2x AMD Instinct MI50 |
Negishi | cms | 4096 | 14 days | standard priority |
Negishi | highmem | 768 | 1 day | 4 GB RAM per core |
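For scripted workflows, the table can be kept as plain data and filtered by cluster and required run time. The sketch below is one possible way to do this; the function name queues_for is hypothetical, the values are copied from the table by hand, and a time limit of None marks queues for which the table gives no limit (an interpretation of the empty cells, not a confirmed value).

```python
# (cluster, queue, cores, time limit in days or None if not listed, notes)
DEDICATED_QUEUES = [
    ("Hammer", "cms", 7400, 14, "standard priority"),
    ("Hammer", "cms-a", 3000, 14, "higher priority"),
    ("Gilbreth", "cms-f", 4, 14, "2x Tesla V100"),
    ("Bell", "cms", 1216, 14, "standard priority"),
    ("Bell", "gpu", 512, None, "2x AMD Instinct MI50"),
    ("Bell", "highmem", 1024, 1, "4GB RAM per core"),
    ("Bell", "multigpu", 48, None, "2x AMD Instinct MI50"),
    ("Negishi", "cms", 4096, 14, "standard priority"),
    ("Negishi", "highmem", 768, 1, "4GB RAM per core"),
]


def queues_for(cluster, days_needed):
    """List dedicated queues on a cluster whose known limit covers the job.

    Queues without a listed time limit are skipped rather than guessed.
    """
    return [
        queue
        for (name, queue, _cores, limit, _notes) in DEDICATED_QUEUES
        if name == cluster and limit is not None and limit >= days_needed
    ]


print(queues_for("Bell", 0.5))   # ['cms', 'highmem']
print(queues_for("Bell", 3))     # ['cms']
print(queues_for("Hammer", 3))   # ['cms', 'cms-a']
```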